Abstract

This paper presents the implementation of a low power asynchronous circuit based on power gated logic. In the
asynchronous power gated logic (APL) circuit, each pipeline stage incorporates an efficient charge
recovery logic (ECRL) gate, a handshake controller, and a partial charge reuse (PCR) mechanism. The main
objective is to provide very large scale integration (VLSI) designers with a new low power solution
based on power gating (PG). ECRL has the simplest structure and high energy efficiency, and it is used to
implement the functional blocks of the APL circuit. PG can follow two approaches: fine grain and coarse grain.
An asynchronous circuit using the fine grain approach is called an asynchronous fine grain power gated logic
(AFPL) circuit, and one using the coarse grain approach is called an asynchronous coarse grain power gated
logic (ACPL) circuit. In the PCR mechanism, part of the charge on the output node of an ECRL gate is reused
to charge the output node of another ECRL gate, which helps to reduce energy dissipation. Leakage power
reduction therefore begins with power gated logic and the PCR mechanism. To mitigate the area overhead of
the AFPL circuit, a coarse grain power gating technique has been developed.

KEYWORDS: Power Gated Logic, Fine Grain Power Gating, Coarse Grain Power Gating & Partial Charge Reuse

I. Introduction 

As memory capacity and chip size continue to increase, it has become clear that designing efficient,
low power computing devices is a critical issue. Power dissipation has evolved into an
optimization objective because of the growing demand for portable devices and the excessive
heat generated in high speed, high performance systems. Power dissipation can be broken down
into two main categories: dynamic power dissipation and static power dissipation.

Power dissipation has become a critical parameter in nanoscale CMOS VLSI design [6]. Dynamic
power is dissipated by the active parts of a circuit, and static power is dissipated by the inactive parts.
Dynamic power dissipation is caused by switching activity in CMOS circuits, that is, by the charging and
discharging of capacitance. It is also caused by signals switching at the inputs of the logic
gates of the circuit [3]. Leakage current is the main source of
standby power dissipation. The major sources of leakage current include reverse biased PN junction
current and subthreshold channel conduction current.
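
As a rough illustration of the two categories, the first-order textbook expressions for switching power (activity factor times load capacitance times supply voltage squared times frequency) and leakage power (leakage current times supply voltage) can be sketched as below. These expressions, the function names, and every numeric value are illustrative assumptions; none of them is taken from the measurements in this paper.

```python
def dynamic_power(activity, c_load, vdd, freq):
    """First-order switching power in watts: activity * C * Vdd^2 * f."""
    return activity * c_load * vdd ** 2 * freq


def static_power(i_leak, vdd):
    """First-order leakage power in watts: I_leak * Vdd."""
    return i_leak * vdd


# Hypothetical operating point (assumed values, not from the paper).
p_dyn = dynamic_power(activity=0.1, c_load=1e-12, vdd=1.0, freq=1e9)
p_stat = static_power(i_leak=1e-6, vdd=1.0)
print(f"dynamic: {p_dyn * 1e6:.1f} uW, static: {p_stat * 1e6:.1f} uW")
```

In this simplified model the dynamic term vanishes when the switching frequency is zero, while the static term remains, which is why an idle circuit still dissipates leakage power.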

As the feature size shrinks, static power can constitute a significant share of the total power consumption.
There are various techniques for reducing static power dissipation in CMOS circuits. These techniques
include gate sizing, clock gating, power gating, transistor stacking [2], dual threshold CMOS [5], etc.
Power gating is one of the most effective techniques for standby leakage reduction [11], [12]. Power
gating can be implemented in both synchronous and asynchronous circuits.

In a synchronous system, operation is governed by a clock signal that controls all
sequential cells. The clock signal carries no information; it only synchronizes the system
so that it operates in lock-step. The clock therefore accounts for a large share of the
total power consumption. In addition, there are several optimization issues associated with power gated
logic in synchronous circuits. Asynchronous circuits instead use handshaking protocols
to transfer data between neighbouring modules in the data path, and computation is achieved
without any global clock signal. Unlike a global clock signal, the request and
acknowledge signals switch only to indicate the state of the computation, so there are no useless
switching activities. The power dissipation of an asynchronous system is therefore lower than that of
a synchronous system. An asynchronous circuit [1] performs computation only when it is active; in inactive
mode it has no dynamic dissipation, but it still suffers leakage dissipation. Power gated logic is employed
to reduce this static power dissipation, and it can be implemented in a fine grain or a coarse grain manner.
Power gating [13] shuts off the current to blocks of the circuit that are not in use. Power gating techniques
reduce power dissipation by introducing a sleep transistor between the power supply rails and the transistor
stacks [10]. Fine grained power gating [7], [8] adds a separate sleep transistor to every cell, which results
in a large area overhead compared to coarse grained power gating.

In the proposed work, asynchronous circuits are power gated in a coarse grain manner. This low power
logic [14] is called asynchronous coarse grain power gated logic (ACPL). In earlier work, fine grain
power gating was combined with asynchronous circuits; that approach is called asynchronous fine grain
power gated logic (AFPL).

Fine grain power gating allows for better sizing of the sleep transistor than coarse grained power gating.
However, fine grained power gating suffers a large area overhead. To maximize the leakage reduction
of coarse grained power gating, the partial charge reuse (PCR) mechanism has to be adopted as
well.

II. Previous Work 

2.1 Power Gated Logic 

Power gating is a technique used to reduce power dissipation in a circuit by shutting off the current
to blocks that are not in use. Power gating uses a high threshold voltage sleep transistor in series with
the pull-up and/or pull-down network of a low threshold voltage logic block. When the low threshold voltage
logic block is inactive, the sleep transistor is turned off, which reduces the leakage current.

Figure 1 shows the sleep transistors used for power gating. A pMOS sleep
transistor acts as a header switch that connects the power supply, and an nMOS sleep transistor acts as
a footer switch that connects ground. This helps to conserve area and reduce the timing complexity caused
by the voltage drop across the sleep transistor.

Figure 1. Power Gating

If each power gated cell has its own sleep transistor, the approach is
called fine grained power gating. If a sleep transistor is shared by several power gated logic cells, the
approach is called coarse grained power gating or cluster-based power gated logic.

2.2 AFPL



In CMOS circuits, standby power dissipation can make up as much as a third of total power. Several
recent works have focused on leakage power reduction, and one of the effective techniques is power
gated logic. Power gated logic applied in a fine grain manner to an asynchronous circuit is called an
asynchronous fine grain power gated logic (AFPL) circuit. An asynchronous circuit is made up of self-timed
modules that are coordinated by local handshake signals (request and acknowledgement) rather than by a
global clock signal. The handshake protocol used in the AFPL circuit is the four-phase dual-rail protocol.
The power gated logic described here is built on an asynchronous system.

Figure 2. AFPL-PCR pipeline structure

The AFPL circuit comprises an efficient charge recovery logic (ECRL) gate and a handshake controller (HC),
and it adopts the partial charge reuse (PCR) mechanism. Figure 2 shows a simple structure of the AFPL circuit.
AFPL combined with the PCR mechanism is denoted by AFPL-PCR; otherwise it is
denoted by AFPL w/o PCR.

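Table 1 Idle Power Dissipation Comparison

| Implementation | Block | Idle Power Dissipation (nW) |
| --- | --- | --- |
| AFPL w/o PCR | ECRL (A1) | 0.15 |
| AFPL w/o PCR | HC (A2) | 156 |
| AFPL w/o PCR | Total Power (P1 = A1 + A2) | 156 |
| AFPL-PCR | ECRL (B1) | 0.40 |
| AFPL-PCR | PCR (B2) | 0.04 |
| AFPL-PCR | HC (B3) | 133 |
| AFPL-PCR | Total Power (P2 = B1 + B2 + B3) | 134 |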

For performance comparison, an eight-bit five-stage Kogge-Stone adder was implemented. Table 1
compares the idle power dissipation of the AFPL w/o PCR and AFPL-PCR implementations of this
adder when the adder has no valid input. The AFPL w/o PCR implementation
can reduce static power dissipation by 83.1%, and the AFPL-PCR implementation can reduce static power
dissipation by 85.5%.
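
The totals in Table 1 follow from P1 = A1 + A2 and P2 = B1 + B2 + B3. The short sketch below redoes that arithmetic from the per-block entries and also computes how much lower the AFPL-PCR total is relative to AFPL w/o PCR. The variable and function names are illustrative. Because the published entries are rounded, the recomputed sums can differ slightly from the printed totals. The 83.1% and 85.5% figures quoted above refer to a baseline whose value is not listed in Table 1, so they are not recomputed here.

```python
# Idle power values in nW, copied from Table 1.
afpl_without_pcr = {"ECRL": 0.15, "HC": 156.0}
afpl_pcr = {"ECRL": 0.40, "PCR": 0.04, "HC": 133.0}


def total_power(components):
    """Sum the per-block idle power contributions (nW)."""
    return sum(components.values())


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


p1 = total_power(afpl_without_pcr)   # P1 = A1 + A2
p2 = total_power(afpl_pcr)           # P2 = B1 + B2 + B3
hc_share = 100.0 * afpl_without_pcr["HC"] / p1

print(f"P1 = {p1:.2f} nW, P2 = {p2:.2f} nW")
print(f"HC share of P1: {hc_share:.1f}%")
print(f"P2 is {relative_reduction(p1, p2):.1f}% lower than P1")
```

With these entries, almost all of the idle power is dissipated in the handshake controller rather than in the power gated ECRL logic.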

III. Modified Approach 

3.1 ACPL 

Asynchronous circuits employ local handshaking to transfer data between neighbouring modules,
so they are data-driven. An asynchronous circuit can also be implemented with coarse grained power gated
logic. The ACPL circuit tries to overcome the disadvantages of fine grain power gated logic. ACPL combined
with the PCR mechanism is denoted by ACPL-PCR. An ACPL stage comprises an ECRL gate and a handshake
controller (HC) unit, and it incorporates a PCR unit. Figure 3 shows the simple ACPL pipeline structure.
The handshake protocol used in the ACPL pipeline is the four-phase dual-rail protocol; there is no
separate request signal, because the request is encoded into the data signals. Thus, the data stream
flowing through the ACPL pipeline is a sequence of alternating valid tokens and empty tokens: there is
always an empty token between two consecutive valid tokens, and vice versa.

Figure 3. ACPL pipeline structure
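
The alternation of valid and empty tokens can be illustrated with a small behavioural sketch. It assumes the common dual-rail convention in which a bit is carried on a (true rail, false rail) pair, (1, 0) encodes logic 1, (0, 1) encodes logic 0, and (0, 0) is the empty value. The paper does not spell out this encoding, so the convention and all names below are assumptions.

```python
EMPTY = (0, 0)  # assumed spacer value: both rails low


def encode_dual_rail(bit):
    """Encode one bit on an assumed (true_rail, false_rail) wire pair."""
    return (1, 0) if bit else (0, 1)


def to_token_stream(words):
    """Follow every valid token with an empty token of the same width."""
    stream = []
    for word in words:
        stream.append([encode_dual_rail(b) for b in word])  # valid token
        stream.append([EMPTY] * len(word))                   # empty token
    return stream


def is_valid(token):
    """A token is valid when every pair carries exactly one high rail."""
    return all(pair in ((1, 0), (0, 1)) for pair in token)


def is_empty(token):
    """A token is empty when every pair has both rails low."""
    return all(pair == EMPTY for pair in token)


def alternates(stream):
    """True if the stream strictly alternates valid and empty tokens."""
    if not all(is_valid(t) or is_empty(t) for t in stream):
        return False
    return all(is_valid(a) != is_valid(b) for a, b in zip(stream, stream[1:]))


stream = to_token_stream([[1, 0], [0, 0]])
print(alternates(stream))
```

Because validity is visible in the data wires themselves, a receiving stage needs no separate request wire to know when a new word has arrived.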

3.2 ECRL Gate 

ECRL has the simplest structure and the best energy efficiency, so ECRL [15] was chosen to implement
the function blocks of AFPL and ACPL. ECRL adopts dual-rail data encoding; that is, each input to
an ECRL gate requires both polarities to be represented, and each ECRL gate computes both a logic
function and its complement [4].
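
At the logic level, dual-rail operation means that a gate receives both polarities of each input and drives both polarities of its output. The sketch below models this for the exclusive-OR function only. The rail ordering (true rail, false rail) and the function names are assumptions made for illustration, and the model ignores the supply clock and all transistor-level behaviour.

```python
def encode(bit):
    """Dual-rail encoding with an assumed (true_rail, false_rail) ordering."""
    return (1, 0) if bit else (0, 1)


def dual_rail_xor(a, b):
    """Return (xor_rail, xnor_rail) for dual-rail inputs a and b."""
    a_t, a_f = a
    b_t, b_f = b
    xor_rail = (a_t & b_f) | (a_f & b_t)    # high when the inputs differ
    xnor_rail = (a_t & b_t) | (a_f & b_f)   # high when the inputs match
    return (xor_rail, xnor_rail)


for x in (0, 1):
    for y in (0, 1):
        print(x, y, dual_rail_xor(encode(x), encode(y)))
```

In this model an all-low (empty) input produces an all-low output, so an empty token propagates through the gate as an empty token.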



Figure 4. (a) ECRL exclusive-OR gate (b) Operation phases for ECRL

An ECRL gate consists of PMOS loads and NMOS pull-down transistors. This structure needs
differential inputs. Figure 4(a) shows the schematic of an exclusive-OR gate. A differential structure is
preferable because the loading capacitance seen by the clocks remains constant regardless of the input
combination, which is important for the design of the clock circuits.

The ECRL gates in the AFPL pipeline acquire their power from the handshake controllers instead of
a conventional fixed dc power supply. The operation cycle of an ECRL gate comprises four phases,
as shown in Figure 4(b): 1) Wait, 2) Evaluate, 3) Hold, and 4) Discharge.

ECRL uses four-phase clocking to efficiently recover the charge delivered by the supply clock. Each
clock is followed by the next clock with a 90 degree phase lag. So when the previous stage is in the
hold phase, the next stage must evaluate logic values in the precharge and evaluation phase.
Energy is recovered as the supply clock goes down, although not completely. This
energy loss is very small compared to other logic families. Four-phase clocking is recommended
for effective energy saving.
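
The 90 degree lag between neighbouring stages can be sketched as a simple schedule. The model below assumes that each of the four phases listed above occupies one quarter of the cycle and that stage k runs k quarter-cycles behind stage 0. Both are simplifying assumptions, and the function name is hypothetical. The sketch describes conventional four-phase clocking, not the handshake-driven supply used in the AFPL pipeline.

```python
PHASES = ["Wait", "Evaluate", "Hold", "Discharge"]


def phase_of(stage, step):
    """Phase of pipeline stage `stage` at quarter-cycle `step`.

    Each stage lags its predecessor by one quarter cycle (90 degrees).
    """
    return PHASES[(step - stage) % len(PHASES)]


for step in range(4):
    print(step, [phase_of(stage, step) for stage in range(4)])
```

Under these assumptions, whenever a stage is in Hold its successor is in Evaluate, so the successor always sees stable inputs while it computes.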

3.3 Handshake Controllers and PCR Mechanism

A handshake controller (HC) comprises a completion detector (CD), a C-element and an inverter
[9]. The CD in the HC detects whether the input to the stage represents a valid codeword or an
empty codeword. The output of the CD goes from LOW to HIGH when the input to the stage becomes
a valid codeword, and goes from HIGH to LOW when the input to the stage becomes an empty
codeword. If the input consists of n-bit data, n pairs of wires are required to encode the input, and the
associated CD can be implemented with an n-input C-element gate and n two-input OR gates.
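
The completion detector described above can be sketched behaviourally: one two-input OR per dual-rail pair reports whether that bit is valid, and an n-input C-element combines the results. The class names are illustrative, and the C-element is modelled with its usual behaviour (output goes high when all inputs are high, low when all inputs are low, and otherwise holds its previous value).

```python
class CElement:
    """Muller C-element: follows the inputs when they all agree, else holds."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, inputs):
        if all(v == 1 for v in inputs):
            self.out = 1
        elif all(v == 0 for v in inputs):
            self.out = 0
        return self.out


class CompletionDetector:
    """n two-input OR gates feeding one n-input C-element."""

    def __init__(self, n_bits):
        self.n_bits = n_bits
        self.c_element = CElement()

    def update(self, codeword):
        if len(codeword) != self.n_bits:
            raise ValueError("codeword width does not match detector width")
        # One OR gate per dual-rail pair: high when either rail is high.
        bit_valid = [rail_a | rail_b for (rail_a, rail_b) in codeword]
        return self.c_element.update(bit_valid)


cd = CompletionDetector(2)
trace = [
    cd.update([(0, 0), (0, 0)]),  # empty codeword
    cd.update([(1, 0), (0, 0)]),  # partially valid: output holds
    cd.update([(1, 0), (0, 1)]),  # fully valid
    cd.update([(0, 0), (0, 1)]),  # partially empty: output holds
    cd.update([(0, 0), (0, 0)]),  # fully empty
]
print(trace)
```

The hold behaviour is what makes the detector safe: its output changes only after every bit of the codeword has become valid, or every bit has returned to empty.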




Figure 5. PCR unit and HC unit structure

Figure 5 shows the structure of the PCR unit and the HC unit. The C*-element is a state-holding element
for asynchronous circuits. In the figure, C denotes a combinational circuit and C* denotes the enhanced
C-element. The C*-element offers the advantage that an ECRL gate can discharge early if its outputs are no
longer required, without waiting for the next empty token to arrive at this stage.

In the PCR unit, transistor M2 is used as a diode, which allows current to flow only in the
direction from Vpi to Vpi+2, and transistor M1 is used as a switch, which is turned on when charge
reuse is activated. The structure of the C*-element is shown in Figure 6.




Figure 6. Enhanced C*-element used in AFPL-PCR and ACPL-PCR circuits. 

The C*-element in the HC has three inputs: Ri, Ain and -Ain, the latter two of which are complementary.
Ri is the request signal from the CD in the HC. After reset, Ri=0, Ain=0, and -Ain=1. The use of the
C*-element makes it possible to synchronize the discharging of gate Gi with the evaluating of gate Gi+2 and
to have gate Gi enter the sleep mode early to further reduce static power dissipation.

Power gating is an effective method for reducing power dissipation. In the proposed work, coarse grain
power gated logic is used instead of fine grain power gated logic. This
technique is called asynchronous coarse grain power gated logic, and it can achieve high
power savings.






International Journal of Scientific Research and Management Studies (IJSRMS) 

ISSN: 2349-3771 Volume 1 Issue 1, pg: 11-18 



The sleep transistors can be implemented in a design in either "coarse-grain" or "fine-grain" power
gating styles. In the fine grain implementation, a sleep transistor is inserted in every standard cell. A
power gating control signal is added to switch the power supply to the cell on and off. A weak
pull-up/down transistor controlled by the sleep signal is added to prevent a floating output when the cell is in
sleep mode. This is necessary to prevent short circuit current in the active cells connected to the
sleeping cell, which would otherwise see floating inputs. The pull-up/down transistor remains in the OFF
state in normal operation mode. Only one isolation state is allowed, which is "1" in footer switch
implementations and "0" in header switch implementations.

The main advantage of "coarse-grain" power gating is that the sleep transistors share charge and
discharge current. Consequently, it is less sensitive to process, voltage and temperature variation than
the "fine-grain" implementation. Also, the area overhead is significantly smaller because the sleep
transistors are shared. Most power-gating designs prefer the "coarse-grain" sleep transistor
implementation to the "fine-grain" implementation, which incurs a large area
penalty and higher sensitivity.

IV. Comparison Results 

4.1 Power Analysis Result - AFPL

As given in Table 2, the AFPL w/o PCR implementation can reduce power dissipation by 19.1%-32.0%,
and the AFPL-PCR implementation can reduce power dissipation by 30.6%-55.3%.

Table 2 Power Result for AFPL Circuit

| Parameter | Power Saving in AFPL w/o PCR | Power Saving in AFPL-PCR |
| --- | --- | --- |
| Total power consumption (mW) | 819 | 784 |
| I (mA) | 451 | 432 |
| P (mW) | 812 | 778 |



4.2 Power Analysis Result - ACPL

The modification is made in the handshake controller (HC) unit: coarse grain gating is
implemented in the enhanced C-element circuit. Table 3 compares the total estimated power consumption
of fine grain and coarse grain power gating.

Table 3 Power Comparison Result

Total Estimated Power Consumption

| Input data rate (MHz) | Fine Grain Gating (mW) | Coarse Grain Gating (mW) |
| --- | --- | --- |
| 100 | 243 | 155 |
| 200 | 453 | 276 |
| 300 | 662 | 397 |
| 400 | 872 | 518 |
| 500 | 1081 | 639 |
| 600 | 1291 | 760 |
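
The relative saving implied by Table 3 can be computed per row. The sketch below copies the values from the table; the function and variable names are illustrative.

```python
# (input data rate in MHz, fine grain mW, coarse grain mW), from Table 3.
TABLE_3 = [
    (100, 243, 155),
    (200, 453, 276),
    (300, 662, 397),
    (400, 872, 518),
    (500, 1081, 639),
    (600, 1291, 760),
]


def saving_percent(fine_mw, coarse_mw):
    """Percentage by which coarse grain gating is below fine grain gating."""
    return 100.0 * (fine_mw - coarse_mw) / fine_mw


savings = {rate: saving_percent(fine, coarse) for rate, fine, coarse in TABLE_3}
for rate, saving in savings.items():
    print(f"{rate} MHz: coarse grain gating uses {saving:.1f}% less power")
```

For these entries the saving grows with the input data rate, from about 36% at 100 MHz to about 41% at 600 MHz.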

4.3 Area Analysis

The ACPL circuit was implemented to reduce area overhead. Table 4 compares the area of the AFPL and
ACPL implementations and shows the device utilization of each.

Table 4 Area Comparison of AFPL and ACPL Circuit

| Total accumulated area | AFPL | ACPL |
| --- | --- | --- |
| Number of slices | 390 | 346 |
| Number of slice Flip Flop | 134 | 148 |
| Number of 4-input LUT | 615 | 557 |
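
The change in each resource count from AFPL to ACPL can be derived from Table 4 as follows. The dictionary keys and the function name are illustrative; the counts are copied from the table.

```python
AFPL = {"slices": 390, "slice flip-flops": 134, "4-input LUTs": 615}
ACPL = {"slices": 346, "slice flip-flops": 148, "4-input LUTs": 557}


def percent_change(before, after):
    """Signed percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


changes = {name: percent_change(AFPL[name], ACPL[name]) for name in AFPL}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

By these figures ACPL uses fewer slices and LUTs than AFPL but more slice flip-flops, so the area saving is not uniform across resource types.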



4.4 Timing Analysis

Timing analysis plays a vital role in facilitating the fast and reasonably accurate measurement of
circuit timing. The arrival time of a signal is the time elapsed for the signal to arrive at a certain point.
Table 5 shows the timing analysis of the AFPL and ACPL circuit implementations.

Table 5 Timing Analysis of Power Gated Logic

| Parameter | AFPL | ACPL |
| --- | --- | --- |
| Maximum frequency | 278.66 MHz | 309.73 MHz |
| Minimum input arrival time | 35.771 ns | 16.04 ns |
| Maximum output required time | 33.658 ns | 10.607 ns |
| Combinational path delay | 42.70 ns | 8.376 ns |



V. Simulation Results 

To evaluate the effectiveness of AFPL-PCR and ACPL-PCR, an eight-bit five-stage Kogge-Stone
adder was implemented for performance comparison. The simulations were performed using Xilinx tools.
Figure 8 shows the simulation waveform for the ACPL-PCR implementation with different input data
rates.

Figure 8. Simulation Waveform for ACPL-PCR Circuit

VI. Conclusion

The ACPL-PCR implementation of the Kogge-Stone adder can reduce static power dissipation by
90.3% when the adder has no valid inputs, and can reduce power dissipation by 82.6%-93.0% when the
input data rate ranges from 30 to 900 MHz. The AFPL-PCR pipeline uses the enhanced C*-element in
its handshake controllers so that an ECRL logic gate in the AFPL-PCR pipeline can enter the sleep
mode early to reduce leakage dissipation once its output has been received by the downstream
pipeline stage. Most power-gating designs prefer the "coarse-grain" sleep transistor implementation
to the "fine-grain" implementation because it conserves area.